Abstract



VHPOP is a partial order causal link (POCL) planner loosely based on UCPOP. It
draws from the experience gained in the early to mid 1990s on flaw selection strategies
for POCL planning, and combines this with more recent developments in the field of
domain independent planning such as distance based heuristics and reachability analysis. We
present an adaptation of the additive heuristic for plan space planning, and modify it to
account for possible reuse of existing actions in a plan. We also propose a large set of novel
flaw selection strategies, and show how these can help us solve more problems than
previously possible by POCL planners. VHPOP also supports planning with durative actions
by incorporating standard techniques for temporal constraint reasoning. We demonstrate
that the same heuristic techniques used to boost the performance of classical POCL
planning can be effective in domains with durative actions as well. The result is a versatile
heuristic POCL planner competitive with established CSP-based and heuristic state space
planners.

1. Introduction 

During the first half of the last decade, much of the research in domain independent plan 
generation focused on partial order causal link (POCL) planners. The two dominant POCL 
planners were SNLP (McAllester & Rosenblitt, 1991) and UCPOP (Penberthy & Weld, 
1992), and a large part of the planning research was aimed at scaling up these two planners. 
The most promising attempts at making POCL planning practical involved alternative flaw 
selection strategies (Peot & Smith, 1993; Joslin & Pollack, 1994; Schubert & Gerevini, 1995;
Williamson & Hanks, 1996; Pollack, Joslin, & Paolucci, 1997). A flaw in POCL planning is 
either an unlinked precondition (called open condition) for an action, or a threatened causal 
link. While flaw selection is not a backtracking point in the search through plan space for 
a complete plan, the order in which flaws are resolved can have a dramatic effect on the 
number of plans searched before a solution is found. The role of flaw selection in POCL 
planning is similar to the role of variable selection in constraint programming. 

There have been dramatic advances in domain independent planning in the past seven
years, but the focus has shifted from POCL planning to CSP-based planning algorithms
(Blum & Furst, 1997; Kautz & Selman, 1996) and state space planning as heuristic search
(Bonet & Geffner, 2001b; Hoffmann & Nebel, 2001). Recently, Nguyen and Kambhampati
(2001) showed that techniques such as distance based heuristics and reachability
analysis, which are largely responsible for the efficiency of today's best domain independent
planners, can also be used to dramatically improve the efficiency of POCL planners, thereby
initiating a revival of this previously popular approach to domain independent planning. We
have drawn from their experience, as well as from experience with flaw selection strategies
from the glory-days of POCL planning, when developing the Versatile Heuristic Partial 
Order Planner (VHPOP), and the result is a POCL planner that was able to compete 
well with CSP-based and heuristic state space planners at the 3rd International Planning 
Competition (IPC3). 

We have previously (Younes & Simmons, 2002) adapted the additive heuristic — proposed 
by Bonet, Loerincs, and Geffner (1997) and used in HSP (Bonet & Geffner, 2001b) — for 
plan space search. In this paper we present a variation of the additive heuristic for POCL 
planning that accounts for possible reuse of actions that are already part of a plan. We 
show that this accounting for positive interaction often results in a more effective plan 
ranking heuristic. We also present ablation studies that demonstrate the effectiveness of 
a tie-breaking heuristic based on estimated planning effort (defined as the total number 
of open conditions, current and future, that need to be resolved in order to complete a 
partial plan). The results show that using this tie-breaking heuristic almost always improves
planner performance. 

While the heuristics implemented in VHPOP can work with either ground (fully instantiated)
or lifted (partially instantiated) actions, we chose to work only with ground actions
at IPC3. We have shown elsewhere (Younes & Simmons, 2002) that planning with lifted 
actions can help reduce the branching factor of the search space compared to using ground 
actions, and that this reduction sometimes is large enough to compensate for the added 
complexity that comes with having to keep track of variable bindings. Further studies are 
needed, however, to gain a better understanding of the circumstances under which planning 
with lifted actions is beneficial. 

VHPOP efficiently implements all the common flaw selection strategies, such as DUnf 
and DSep (Peot & Smith, 1993), LCFR (Joslin & Pollack, 1994), and ZLIFO (Schubert & 
Gerevini, 1995). In addition to these, we introduce numerous novel flaw selection strategies 
in this paper, of which four were used at IPC3. While we do not claim to have resolved 
the issue of global versus local flaw selection — manifested by the conflicting claims made 
by Gerevini and Schubert (1996) on the one hand, and Pollack et al. (1997) on the other 
about the most efficient way to reduce the number of searched plans in POCL planning — 
we show that by combining ideas from both ZLIFO and LCFR we can get very efficient 
flaw selection strategies. Other novel flaw selection strategies introduced in this paper 
are based on heuristic cost, an idea previously explored by Ghallab and Laruelle (1994). 
We also introduce "conflict-driven" flaw selection strategies that aim to expose possible 
inconsistencies early in the search, and we show that strategies based on this idea can be 
effective in domains previously thought to be particularly difficult for POCL planners. 

Ideally, we would like to have one single flaw selection strategy that dominates all other 
strategies in terms of number of solved problems. We have yet to discover such a universal 
strategy, so instead we use a technique previously explored by Howe, Dahlman, Hansen, 
Scheetz, and von Mayrhauser (1999) for combining the strengths of different planning
algorithms. The idea is to run several planners concurrently, and Howe et al. showed that
by doing so more problems can be solved than by running any single planner. In VHPOP 
we use the same basic POCL planning algorithm in all instances, but we use different flaw 
selection strategies concurrently. 

VHPOP extends the capabilities of classical POCL planners by also supporting planning
with durative actions. This is accomplished by adding a simple temporal network (STN) 
(Dechter, Meiri, & Pearl, 1991) to the regular plan representation of a POCL planner. The 
STN records temporal constraints between actions in a plan, and supersedes the simple 
ordering constraints usually recorded by POCL planners. The use of STNs permits actions 
with interval constraints on the duration (a feature that was not utilized by any of the 
domains at IPC3 that VHPOP could handle). The approach we take to temporal POCL 
planning is essentially the same as the constraint-based interval approach described by 
Smith, Frank, and Jonsson (2000), and similar techniques for handling durative actions 
in a POCL framework can be traced back at least to Vere's DEVISER (Vere, 1983). 
Our contribution to temporal POCL planning is demonstrating that the same heuristic 
techniques shown to boost the performance of classical POCL planning can also be effective 
in domains with durative actions, validating the feasibility of the POCL paradigm for 
temporal planning on a larger set of benchmark problems than has been done before. 

2. Basic POCL Planning Algorithm 

We briefly review how POCL planners work, and introduce the terminology used throughout 
this paper. For a thorough introduction to POCL planning, we refer the reader to the 
tutorial on least commitment planning by Weld (1994). 

A (partial) plan can be represented by a tuple $\langle \mathcal{A}, \mathcal{C}, \mathcal{O}, \mathcal{B} \rangle$, where $\mathcal{A}$ is a set of actions,
$\mathcal{C}$ a set of causal links, $\mathcal{O}$ a set of ordering constraints defining a partial order on the set $\mathcal{A}$,
and $\mathcal{B}$ a set of binding constraints on the action parameters ($\mathcal{B} = \emptyset$ if ground actions are
used). Each action $a$ is an instance of some action schema $A$ in the planning domain, and
a plan can contain multiple instances of the same action schema. A causal link, $a_i \xrightarrow{q} a_j$,
represents a commitment by the planner that precondition $q$ of action $a_j$ is to be fulfilled
by an effect of action $a_i$.

An open condition, $\xrightarrow{q} a_i$, is a precondition $q$ of action $a_i$ that has not yet been linked
to an effect of another action. An unsafe link (or threat) is a causal link, $a_i \xrightarrow{q} a_j$, whose
condition $q$ unifies with the negation of an effect of an action that could possibly be ordered
between $a_i$ and $a_j$. The set of flaws of a plan $\pi$ is the union of open conditions and unsafe
links: $\mathcal{F}(\pi) = \mathcal{OC}(\pi) \cup \mathcal{UL}(\pi)$.

A POCL planner searches for a solution to a planning problem in the space of partial 
plans by trying to resolve all flaws in a plan. Algorithm 1 shows a generic procedure for 
POCL planning that given a planning problem returns a plan solving the problem (or failure 
if the given problem lacks a solution). A planning problem is a set of initial conditions $\mathcal{I}$
and a set of goals $\mathcal{G}$, and is represented by an initial plan with two dummy actions $a_0 \prec a_\infty$,
where the effects of $a_0$ represent the initial conditions of the problem and the preconditions
of $a_\infty$ represent the goals of the problem. The procedure Make-Initial-Plan used in
Algorithm 1 returns the plan $\langle \{a_0, a_\infty\}, \emptyset, \{a_0 \prec a_\infty\}, \emptyset \rangle$. A set V of generated, but not
yet visited, partial plans is kept. At each stage in the planning process, a plan is selected 
and removed from V, and then a flaw is selected for that plan. All possible refinements 
resolving the flaw (returned by the procedure Refinements) are added to V, and the 
process continues until V is empty (indicates failure) or a plan without flaws is found. 

Algorithm 1 Generic POCL planning algorithm as formulated by Williamson and Hanks
(1996).

Find-Plan($\mathcal{I}$, $\mathcal{G}$)
    $V \leftarrow \{$Make-Initial-Plan($\mathcal{I}$, $\mathcal{G}$)$\}$
    while $V \neq \emptyset$ do
        $\pi \leftarrow$ some element of $V$    $\triangleright$ plan selection
        $V \leftarrow V \setminus \{\pi\}$
        if $\mathcal{F}(\pi) = \emptyset$ then
            return $\pi$
        else
            $f \leftarrow$ some element of $\mathcal{F}(\pi)$    $\triangleright$ flaw selection
            $V \leftarrow V \cup$ Refinements($\pi$, $f$)
    return failure (problem lacks solution)



An open condition, $\xrightarrow{q} a_j$, can be resolved by linking $q$ to the effect of an existing or
new action. An unsafe link, $a_i \xrightarrow{q} a_j$, threatened by the effect $p$ of action $a_k$, can be resolved
by either ordering $a_k$ before $a_i$ (demotion), or by ordering $a_k$ after $a_j$ (promotion). If we
use lifted actions instead of ground actions, a threat can also be resolved by adding binding
constraints so that $p$ and $\neg q$ cannot be unified (separation).
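
The search loop of Algorithm 1 can be sketched in a few lines of Python. This is only an illustration of the control flow, not the VHPOP implementation: the function names (`find_plan`, `toy_flaws`, `toy_refinements`) and the toy plan representation (a tuple of conditions still to be resolved) are assumptions made for this example. The two callbacks `select_plan` and `select_flaw` correspond to the two choice points marked in the algorithm.

```python
def find_plan(initial_plan, flaws, refinements, select_plan, select_flaw):
    """Generic POCL search loop (illustrative sketch of Algorithm 1).

    flaws(plan)              -> list of flaws of the plan
    refinements(plan, flaw)  -> iterable of successor plans
    select_plan(pending)     -> index of the plan to visit next
    select_flaw(plan, flaws) -> the flaw to repair
    """
    pending = [initial_plan]          # generated but not yet visited plans
    while pending:
        plan = pending.pop(select_plan(pending))      # plan selection
        plan_flaws = flaws(plan)
        if not plan_flaws:
            return plan                               # no flaws: complete plan
        flaw = select_flaw(plan, plan_flaws)          # flaw selection
        pending.extend(refinements(plan, flaw))
    return None                                       # failure


def toy_flaws(plan):
    # Hypothetical toy domain: every remaining condition is a flaw.
    return list(plan)


def toy_refinements(plan, flaw):
    # Hypothetical toy domain: exactly one way to resolve a flaw (drop it).
    return [tuple(c for c in plan if c != flaw)]


if __name__ == "__main__":
    result = find_plan(("p", "q"), toy_flaws, toy_refinements,
                       select_plan=lambda pending: 0,
                       select_flaw=lambda plan, fs: fs[-1])  # LIFO flaw order
    print(result)
```

Because flaw selection is not a backtracking point, only the chosen flaw is expanded; all of its refinements are added to the set of pending plans.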

3. Search Control 

In the search for a complete plan, we first select a plan to work on, and given a plan we 
select a flaw to repair. These two choice points are indicated in Algorithm 1. Making 
an informed choice in both these cases is essential for good planner performance, and the 
following is a presentation of how these choices are made in VHPOP. 

3.1 Plan Selection Heuristic 

VHPOP uses the A* algorithm (Hart, Nilsson, & Raphael, 1968) to search through plan 
space. The A* algorithm requires a search node evaluation function f(n) = g(n) + h(n), 
where g(n) is the cost of getting to n from the start node (initial plan) and h(n) is the 
estimated remaining cost of reaching a goal node (complete plan). We want to find plans 
containing few actions, so we take the cost of a plan to be the number of actions in it. For 
a plan $\pi = \langle \mathcal{A}, \mathcal{C}, \mathcal{O}, \mathcal{B} \rangle$ we therefore have $g(\pi) = |\mathcal{A}|$.

The original implementations of SNLP and UCPOP used $h_f(\pi) = |\mathcal{F}(\pi)|$ as the heuristic
cost function, i.e. the number of flaws in a plan. Schubert and Gerevini (1995) consider
alternatives for $h_f(\pi)$, and present empirical data showing that just counting the open
conditions ($h_{oc}(\pi) = |\mathcal{OC}(\pi)|$) often gives better results. A big problem, however, with using
the number of open conditions as an estimate of the number of actions that need to be
added is that it assumes a uniform cost per open condition. It ignores the fact that some 
open conditions can be linked to existing actions (thus requiring no additional actions), 
while other open conditions can be resolved only by adding a whole chain of actions (thus 
requiring more than one action). 

Recent work in heuristic search planning has resulted in more informed heuristic cost
functions for state space planners. We have in previous work (Younes & Simmons, 2002) 
adapted the additive heuristic — first proposed by Bonet et al. (1997) and subsequently used 
in HSP (Bonet & Geffner, 2001b) — for plan space search and also extended it to handle 
negated and disjunctive preconditions of actions as well as actions with conditional effects 
and lifted actions. The heuristic cost function used by VHPOP at IPC3 was a variation of 
the additive heuristic where some reuse of actions is taken into account, coupled with the 
tie-breaking rank (introduced in Younes & Simmons, 2002) based on estimated remaining 
planning effort. 

3.1.1 The Additive Heuristic for POCL Planning 

The key assumption behind the additive heuristic is subgoal independence. We give a 
recursive definition of the additive heuristic for POCL planning, starting at the level of 
literals and working towards a definition of heuristic cost for a partial plan. 

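Given a literal $q$, let $\mathcal{GA}(q)$ be the set of ground actions having an effect that unifies
with $q$. The cost of the literal $q$ can then be defined as

$$h_{add}(q) = \begin{cases} 0 & \text{if } q \text{ unifies with a literal that holds initially} \\ \min_{a \in \mathcal{GA}(q)} h_{add}(a) & \text{if } \mathcal{GA}(q) \neq \emptyset \\ \infty & \text{otherwise} \end{cases}$$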



A positive literal $q$ holds initially if it is part of the initial conditions. A negative literal $\neg q$
holds initially if $q$ is not part of the initial conditions (the closed-world assumption). The
cost of an action $a$ is

$$h_{add}(a) = 1 + h_{add}(\mathit{Prec}(a)),$$

where $\mathit{Prec}(a)$ is a propositional formula in negation normal form representing the preconditions
of action $a$. A propositional formula is in negation normal form if negations only
occur at the level of literals. Any propositional formula can be transformed into negation 
normal form, and this is done for action preconditions by VHPOP while parsing the domain 
description file. 

Existentially quantified variables in an action precondition can be treated as additional 
parameters of the action. The cost of an existentially quantified precondition can then 
simply be defined as follows: 

$$h_{add}(\exists x.\phi) = h_{add}(\phi)$$

We can deal with universally quantified preconditions by making them fully instantiated 
in a preprocessing phase, so in order to complete the definition of heuristic cost for action 
preconditions we only need to add definitions for the heuristic cost of conjunctions and 
disjunctions. The cost of a conjunction is the sum of the cost of the conjuncts: 



$$h_{add}\Big(\bigwedge_i \phi_i\Big) = \sum_i h_{add}(\phi_i)$$



The summation in the above formula is what gives the additive heuristic its name. The 
definition is based on the assumption that subgoals are independent, which can lead to 
overestimation of the actual cost of a conjunctive goal (i.e. the heuristic is not admissible).
The cost of a disjunction is taken to be the cost of the disjunct with minimal cost: 

$$h_{add}\Big(\bigvee_i \phi_i\Big) = \min_i h_{add}(\phi_i)$$

The additive heuristic cost function for POCL plans can now be defined as follows: 

$$h_{add}(\pi) = \sum_{\xrightarrow{q} a_i \in \mathcal{OC}(\pi)} h_{add}(q)$$

As with the cost function for conjunction, the above definition can easily lead to overestimation
of the number of actions needed to complete a plan, since possible reuse is ignored.
We propose a remedy for this below. 

The cost of ground literals can be efficiently computed through dynamic programming. 
We take conditional effects into account in the cost computation. If the effect $q$ is conditioned
by $p$ in action $a$, we add $h_{add}(p)$ to the cost of achieving $q$ with $a$. We only need
to compute the cost for ground literals once during a preprocessing phase, leaving little 
overhead for evaluating plans during the planning phase. When working with lifted actions, 
there is extra overhead for unification. It should also be noted that all lifted literals are 
independently matched to ground literals without considering interactions between open 
conditions of the same action. For example, two preconditions (a ?x) and (b ?x) of the 
same action can be unified to ground literals with different matchings for the variable ?x. 
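
The cost computation for ground literals can be illustrated with a small fixed-point iteration. The sketch below is a simplified stand-in for the preprocessing step, under several assumptions that are ours and not part of the planner: literals are plain strings, preconditions are conjunctions of positive literals, and negation, disjunction, conditional effects, and lifted actions are not handled. The function name `additive_costs` and the key-and-door domain are hypothetical.

```python
import math


def additive_costs(initial, actions):
    """Compute h_add for ground literals by iterating to a fixed point.

    initial: iterable of literals that hold initially (cost 0)
    actions: dict mapping an action name to (preconditions, effects),
             both given as lists of ground positive literals
    Literals missing from the returned dict have infinite cost.
    """
    cost = {literal: 0 for literal in initial}
    changed = True
    while changed:
        changed = False
        for preconditions, effects in actions.values():
            # Cost of a conjunction is the sum of the costs of the conjuncts.
            pre_cost = sum(cost.get(p, math.inf) for p in preconditions)
            if math.isinf(pre_cost):
                continue                      # action not yet reachable
            action_cost = 1 + pre_cost        # h_add(a) = 1 + h_add(Prec(a))
            for effect in effects:
                if action_cost < cost.get(effect, math.inf):
                    cost[effect] = action_cost
                    changed = True
    return cost


if __name__ == "__main__":
    toy_actions = {
        "move-a-b": (["at-a"], ["at-b"]),
        "move-b-c": (["at-b"], ["at-c"]),
        "pick-up-key": (["at-b"], ["have-key"]),
        "open-door": (["at-c", "have-key"], ["door-open"]),
    }
    costs = additive_costs(["at-a"], toy_actions)
    print(costs["door-open"])
```

In this toy domain the cost of `door-open` is 1 + (2 + 2) = 5, although four actions suffice, because the cost of reaching `at-b` is counted once for each of the two subgoals. This is the overestimation caused by the subgoal independence assumption.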

3.1.2 Accounting for Positive Interaction 

The additive heuristic does not take reuse of actions (other than the dummy action $a_0$)
into account, so it often overestimates the actual number of actions needed to complete 
a plan. The need to take positive interaction into account in order to obtain a more 
accurate heuristic estimate has been recognized in both state space planning (Nguyen & 
Kambhampati, 2000; Hoffmann & Nebel, 2001; Refanidis & Vlahavas, 2001) and plan space 
planning (Nguyen & Kambhampati, 2001). For IPC3 we used a slight modification of the 
additive heuristic to address the issue of action reuse: 

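$$h^r_{add}(\pi) = \sum_{\xrightarrow{q} a_i \in \mathcal{OC}(\pi)} \begin{cases} 0 & \text{if } \exists a_j \in \mathcal{A} \text{ s.t. an effect of } a_j \text{ unifies with } q \text{ and } a_i \prec a_j \notin \mathcal{O} \\ h_{add}(q) & \text{otherwise} \end{cases}$$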

The underlying assumption for this heuristic cost function is that an open condition $\xrightarrow{q} a_i$
that can possibly be resolved by linking to the effect of an existing action $a_j$ will not give
rise to a new action when resolved. This can of course lead to an overly optimistic estimate 
of the number of actions required to complete the plan. The modified heuristic is still not 
admissible, however, since the same cost value as before is used for open conditions that 
cannot be linked to effects of existing actions. In other words, we only account for possible 
reuse of existing actions and not potential actions. 

Table 1: Planning times in seconds using different flaw selection strategies for a selection of
problems in the DriverLog, ZenoTravel, and Satellite domains, showing the impact
of taking reuse into account in the plan ranking heuristic. A dash (-) means that
the planner ran out of memory (512Mb).

To illustrate the difference between $h_{add}(\pi)$ and $h^r_{add}(\pi)$, consider a planning domain
with two action schemas $A_1$ and $A_2$, where $A_1$ has no preconditions and $A_2$ has a single
precondition $q$. Assume that $q$ can only be achieved through an action instance of $A_1$. The
heuristic cost for the literal $q$ is therefore 1 according to the additive heuristic. Consider
now a plan $\pi$ with two unordered actions $a_1$ and $a_2$ ($a_i$ being an instance of action schema
$A_i$) and a single open condition $\xrightarrow{q} a_2$. We have $h_{add}(\pi) = h_{add}(q) = 1$, corresponding to
the addition of a new instance of action schema $A_1$ to achieve $q$, but $h^r_{add}(\pi) = 0$ because
there is an action (viz. $a_1$) that is not ordered after $a_2$ and has an effect that unifies with $q$.
Table 1 shows that taking reuse into account can have a significant impact on planning time
in practice. The modified additive heuristic $h^r_{add}$ clearly dominates $h_{add}$ in the DriverLog
and ZenoTravel domains despite incurring a higher overhead per generated plan. The results
in the Satellite domain are more mixed, with $h_{add}$ having a slight edge overall. We show
planning times for the four flaw selection strategies that were used by VHPOP at IPC3.
These and other novel flaw selection strategies are discussed in detail in Section 3.2.2.
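
The two-action example can be checked with a short Python sketch of both plan-level cost functions. The representation is an assumption made for this illustration only: literals are ground strings (so unification reduces to equality), an open condition is a pair (literal, consuming action), `effects` maps each action in the plan to the set of literals it achieves, and `orderings` is a set of (before, after) pairs assumed to be transitively closed. The dummy actions are left out, and the function names are hypothetical.

```python
def h_add_plan(open_conditions, literal_cost):
    """Additive heuristic for a plan: sum of the costs of all open conditions."""
    return sum(literal_cost[q] for q, _consumer in open_conditions)


def h_add_reuse_plan(open_conditions, literal_cost, effects, orderings):
    """Additive heuristic with reuse: an open condition costs 0 if some
    existing action achieves it and is not ordered after the consumer."""
    total = 0
    for q, consumer in open_conditions:
        reusable = any(
            q in achieved and (consumer, producer) not in orderings
            for producer, achieved in effects.items()
        )
        total += 0 if reusable else literal_cost[q]
    return total


if __name__ == "__main__":
    literal_cost = {"q": 1}                  # h_add(q) = 1, via schema A1
    effects = {"a1": {"q"}, "a2": set()}     # a1 achieves q
    open_conditions = [("q", "a2")]          # q is an open condition of a2
    print(h_add_plan(open_conditions, literal_cost))
    print(h_add_reuse_plan(open_conditions, literal_cost, effects, set()))
```

With no ordering between the two actions the first function returns 1 and the second returns 0. If the constraint that $a_2$ precedes $a_1$ is added, that is `orderings = {("a2", "a1")}`, the existing action can no longer supply $q$ and both functions return 1.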

Hoffmann and Nebel (2001) describe the FF heuristic that takes positive interaction between
actions into account by extracting a plan from the relaxed planning graph,¹ and argue
that the accounting of action reuse is one of the contributing factors to FF's performance 
advantage over HSP. The FF heuristic can take reuse of potential actions into account, and 
not just existing actions as is the case with our modified additive heuristic. This should 
result in a better estimate of actual plan cost, but requires that a plan is extracted from the 



1. A relaxed planning graph is a planning graph with no action pairs marked as mutex. 
relaxed planning graph for every search node, which could be costly. It would be interesting
to see how the FF heuristic performs if used in a plan space planner. 

The heuristic cost function used in RePOP (Nguyen & Kambhampati, 2001), a heuristic 
partial order planner working solely with ground actions, is defined using a serial planning 
graph.² The heuristic is similar in spirit to the FF heuristic and, like the FF heuristic, can
take reuse of potential actions into account. The RePOP heuristic also takes into account
reuse of existing actions, but seemingly without considering ordering constraints, which
is something we do in our modified additive heuristic. Furthermore, our $h^r_{add}$ heuristic
always takes reuse of any existing action that achieves a literal $q$ into account, while the
RePOP heuristic only considers an existing action if it happens to be selected from the
serial planning graph as the action that achieves $q$. The results in Table 2 indicate that
the RePOP heuristic may be less effective than the additive heuristic (with and without 
reuse) in certain domains. 

3.1.3 Estimating Remaining Effort 

Not only do we want to find plans consisting of few actions, but we also want to do so 
exploring as few plans as possible. Schubert and Gerevini (1995) suggest that the number 
of open conditions can be useful as an estimate of the number of refinement steps needed 
to complete a plan. We take this idea a bit further. 

When computing the heuristic cost of a literal, we also record the estimated effort of 
achieving the literal. A literal that is achieved through the initial conditions has estimated 
effort 1 (corresponding to the work of adding a causal link to the plan). If the cost of a 
literal comes from an action a, the estimated effort for the literal is the estimated effort for 
the preconditions of a, plus 1 for linking to a. Finally, the estimated effort of a conjunction 
is the sum of the estimated effort of the conjuncts, while the estimated effort of a disjunction 
is the estimated effort of the disjunct with minimal cost (not effort). 
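
The verbal definition above can be written out in the same style as the cost equations. The symbol $\mathit{eff}$ and the starred indices are our notation, introduced here for illustration only. The text does not say how ties between equally cheap actions or disjuncts are broken, so $a^*$ and $i^*$ below stand for any minimizer.

```latex
\begin{align*}
\mathit{eff}(q) &=
  \begin{cases}
    1 & \text{if } q \text{ unifies with a literal that holds initially} \\
    1 + \mathit{eff}(\mathit{Prec}(a^*)) & \text{otherwise, where }
      a^* = \operatorname*{arg\,min}_{a \in \mathcal{GA}(q)} h_{add}(a)
  \end{cases} \\
\mathit{eff}\Big(\bigwedge_i \phi_i\Big) &= \sum_i \mathit{eff}(\phi_i) \\
\mathit{eff}\Big(\bigvee_i \phi_i\Big) &= \mathit{eff}(\phi_{i^*}),
  \quad i^* = \operatorname*{arg\,min}_i h_{add}(\phi_i)
\end{align*}
```

Note that the minimization in the last two cases is over heuristic cost, not over effort, as stated in the text.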

The main difference between heuristic cost and estimated effort of a plan is that estimated
effort assigns the value 1 instead of 0 to literals that can be unified with an initial
condition. To illustrate the difference, consider a plan $\pi$ with two open conditions $p$ and $q$
that both hold in the initial conditions. The heuristic cost for $\pi$ is 0, while the estimated
effort is 2. The estimated effort is basically a heuristic estimate of the total number of open
conditions that will have to be resolved before a complete plan is found, and it is used as a
tie-breaker between two plans $\pi$ and $\pi'$ in case $f(\pi) = f(\pi')$. Consider an alternative plan
$\pi'$ with the same number of actions as $\pi$ but with a single open condition $p$. This plan
has heuristic cost 0, as does the plan $\pi$, but the estimated effort is only 1, so $\pi'$ would be
selected first if estimated effort is used as a tie-breaker. Table 2 shows that using estimated
effort as a tie-breaker can have a notable impact on planner performance for both $h_{add}$ and
$h^r_{add}$. Estimated effort helps reduce the number of generated and explored plans in all cases
but one (when using $h^r_{add}$ on problem rocket-ext-a).

Estimated effort is not only useful as a plan ranking heuristic, but also for heuristic flaw 
selection as we will soon see. 
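
One simple way to realize this ranking is a priority queue ordered lexicographically by $f$ and then by estimated effort. The following sketch uses Python's standard `heapq` module. The class name `PlanQueue` and its interface are hypothetical and are not taken from VHPOP; the insertion counter is an added detail that keeps the ordering well defined when both $f$ and effort are equal.

```python
import heapq
import itertools


class PlanQueue:
    """Priority queue of plans ranked by f = g + h, with ties broken by effort."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()   # final tie-breaker: insertion order

    def push(self, plan, num_actions, heuristic_cost, effort):
        f = num_actions + heuristic_cost    # g is the number of actions
        heapq.heappush(self._heap, (f, effort, next(self._counter), plan))

    def pop(self):
        return heapq.heappop(self._heap)[-1]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = PlanQueue()
    # The two plans from the example: equal f, different estimated effort.
    queue.push("pi", num_actions=2, heuristic_cost=0, effort=2)
    queue.push("pi_prime", num_actions=2, heuristic_cost=0, effort=1)
    print(queue.pop())
```

Here `pi_prime` is returned first, matching the example in the text: both plans have the same $f$ value, but `pi_prime` has the smaller estimated effort.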



2. A serial planning graph is a planning graph with every pair of non-noop actions at the same level marked 
as mutex. 

Table 2: The number of generated/explored plans for $h_{add}$ and $h^r_{add}$, both without and with
estimated effort as a tie-breaker. The RePOP column contains the numbers reported
by Nguyen and Kambhampati (2001) for RePOP using the serial planning
graph heuristic. These numbers are included only for the purpose of showing that 
there seems to be a qualitative difference between the RePOP heuristic and the 
heuristics used by VHPOP. An asterisk (*) means that no solution was found 
after generating 100000 plans. Flaws were selected in LIFO order. 



3.2 Flaw Selection Strategies 

In the original implementations of SNLP and UCPOP threats are selected before open 
conditions. When there is more than one threat (or open condition) that can be selected, 
the one added last is selected first (LIFO order). Several alternative flaw selection strategies
have been proposed in an attempt to improve the performance exhibited by POCL planners. 

Peot and Smith (1993) show that the number of searched plans can be reduced by 
delaying the resolution of some threats. The most successful of the proposed delay strategies 
are DSep, which delays threats that can be resolved through separation, and DUnf, which 
delays threats that can be resolved in more than one way. 

Joslin and Pollack (1994) suggest that all flaws should be treated uniformly, and that 
the flaw with the least number of refinements should be selected first. Their flaw selection 
strategy, LCFR, can be viewed as an instance of the most-constrained-variable heuristic used
in simple search rearrangement backtracking (Bitner & Reingold, 1975; Purdom, 1983). The 
main disadvantage with LCFR is that computing the repair cost for every flaw can incur a 
large overhead for flaw selection. This can lead to longer planning times compared to when 
using the default UCPOP strategy, even if the number of search nodes is significantly 
smaller with LCFR. A clever implementation of LCFR can, however, reduce the overhead 
for flaw selection considerably. 

Schubert and Gerevini (1995) argue that a LIFO strategy for selecting open conditions 
helps the planner maintain focus on the achievement of a particular high-level goal. Their 
ZLIFO strategy is a variation of the DSep strategy, with the difference being that open 
conditions that cannot be resolved, or can be resolved in only one way, are selected before 
open conditions that can be resolved in more than one way. Gerevini and Schubert (1996) 
present results indicating that ZLIFO often needs to generate fewer plans than LCFR 
before a solution is found, and has a smaller overhead for flaw selection. These results are 
disputed by Pollack et al. (1997). They instead attribute much of the power of ZLIFO to
its delaying of separable threats, and propose a variation of LCFR, LCFR-DSep, that also 
delays separable threats. Since we chose to work with ground actions at IPC3, separability 
was not an issue for us. 

3.2.1 Notation for Specifying Flaw Selection Strategies 

In order to better understand the differences between various flaw selection strategies, and 
to simplify comparative studies, Pollack et al. (1997) proposed a unifying notation for 
specifying flaw selection strategies. We adopt their notation with only slight modifications. 

A flaw selection strategy is an ordered list of selection criteria. Each selection criterion 
is of the form 

$\{\textit{flaw types}\}_{\leq \textit{max refinements}}\ \textit{ordering criterion}$,

and applies to flaws of the given types that can be resolved in at most max refinements 
ways. If there is no limit on the number of refinements, we simply write 

{flaw types} ordering criterion. 

The ordering criterion is used to order flaws that the selection criterion applies to. LIFO 
order is used if the ordering criterion cannot be used to distinguish two or more flaws. 

Pollack et al. define the flaw types "o" (open condition), "n" (non-separable threat), 
and "s" (separable threat). They also define the ordering criteria "LIFO", "FIFO", "R" 
(random), "LR"³ (least refinements first), and "New". The last one applies only to open
conditions, and gives preference to open conditions that can be resolved by adding a new 
action. The rest apply to both open conditions and threats. 

Flaws are matched with selection criteria, and it is required for completeness that every 
flaw matches at least one selection criterion in a flaw selection strategy. The flaw that 
matches the earliest selection criterion, and is ordered before any other flaws matching the 
same criterion (according to the ordering criterion), is the flaw that gets selected by the 
flaw selection strategy. Note that we do not always need to test all flaws. If, for example, 
the first selection criterion is {n, s}LIFO, and we have found a threat, then we do not need 
to consider any other flaws for selection. 
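
To make the semantics of the notation concrete, the following Python sketch interprets a strategy as an ordered list of selection criteria. It is a simplified illustration and not the planner's implementation: the `Flaw` and `Criterion` classes, the `age` counter used to model LIFO and FIFO order, and the restriction to the ordering criteria LIFO, FIFO, and LR are all assumptions made for this example. For brevity it scans all flaws for each criterion instead of using the early exit described above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass
class Flaw:
    kind: str          # "o", "n", or "s"
    refinements: int   # number of ways the flaw can be resolved
    age: int           # insertion counter; larger means added more recently


@dataclass
class Criterion:
    kinds: FrozenSet[str]
    order: str                              # "LIFO", "FIFO", or "LR"
    max_refinements: Optional[int] = None   # None means no limit


# Smaller key is preferred; LIFO order breaks ties for LR.
ORDER_KEYS = {
    "LIFO": lambda f: (-f.age,),
    "FIFO": lambda f: (f.age,),
    "LR": lambda f: (f.refinements, -f.age),
}


def select_flaw(strategy, flaws):
    """Return the best flaw under the earliest criterion that matches any flaw."""
    for criterion in strategy:
        matching = [
            f for f in flaws
            if f.kind in criterion.kinds
            and (criterion.max_refinements is None
                 or f.refinements <= criterion.max_refinements)
        ]
        if matching:
            return min(matching, key=ORDER_KEYS[criterion.order])
    return None


# {n,s}LIFO / {o}LIFO and {n,s,o}LR in the notation above.
UCPOP = [Criterion(frozenset("ns"), "LIFO"), Criterion(frozenset("o"), "LIFO")]
LCFR = [Criterion(frozenset("nso"), "LR")]

if __name__ == "__main__":
    flaws = [Flaw("o", 1, 0), Flaw("n", 2, 1), Flaw("o", 3, 2)]
    print(select_flaw(UCPOP, flaws))   # the threat
    print(select_flaw(LCFR, flaws))    # the flaw with a single refinement
```

With the three example flaws, the UCPOP strategy picks the threat because threats match the first criterion, while LCFR picks the open condition that can be resolved in only one way.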

Using this notation, we can specify many different flaw selection strategies in a concise 
manner. Table 3 specifies the flaw selection strategies mentioned earlier. A summary of 
flaw types recognized by VHPOP, including three new flaw types defined below, is given
in Table 4. 

3.2.2 New Flaw Selection Strategies 

We now propose several additional flaw types and ordering criteria, and use these in combination
with the previous ones to obtain some novel flaw selection strategies. Four of these
new flaw selection strategies were used at IPC3 and contributed to the success of VHPOP 
at that event. 

3. The original notation for this ordering criterion is LC for "least (repair) cost", where the repair cost 
is defined to be the number of refinements. Because we introduce a new ordering criterion based on 
heuristic cost, we choose to rename this ordering criterion. 

Name         Specification
UCPOP        {n,s}LIFO / {o}LIFO
DSep         {n}LIFO / {o}LIFO / {s}LIFO
DUnf         {n,s}≤0 LIFO / {n,s}≤1 LIFO / {o}LIFO / {n,s}LIFO
LCFR         {n,s,o}LR
LCFR-DSep    {n,o}LR / {s}LR
ZLIFO        {n}LIFO / {o}≤0 LIFO / {o}≤1 New / {o}LIFO / {s}LIFO

Table 3: A few of the flaw selection strategies previously proposed in the planning literature. 



Flaw Type    Description
n            non-separable threat
s            separable threat
o            open condition
t            static open condition
l            local open condition
u            unsafe open condition

Table 4: Summary of flaw types recognized by VHPOP. 

Early Commitment through Flaw Selection. We have shown (Younes & Simmons,
2002) that giving priority to static open conditions can be beneficial when planning with
lifted actions. Introducing a new flaw type, "t", representing static open conditions, we can
specify this flaw selection strategy as follows: 

Static-First {t}LIFO / {n, s}LIFO / {o}LIFO 

A static open condition is a literal that involves a predicate occurring in the initial 
conditions of a planning problem, but not in the effects of any operator in the planning 
domain. This means that a static open condition always has to be linked to the initial 
conditions, and the initial conditions consist solely of ground literals. Resolving a static 
open condition $\xrightarrow{q} a_i$ will therefore cause all free variables of $q$ to be bound to specific
objects. Resolving static open conditions before other flaws represents a bias towards early 
commitment of parameter bindings. This resembles the search strategy inherent in planners 
using ground actions, but without necessarily committing to bindings for all parameters of 
an action at once. The gain is a reduced branching factor compared to a planner using 
ground actions, and this reduction can compensate for the increased complexity that comes 
with having to keep track of variable bindings. 
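
The strategy notation used above can be read as an ordered list of selection criteria: the first criterion whose flaw types match some flaw in the plan decides which flaw is selected. The following is a minimal Python sketch of that reading, not VHPOP's implementation. The `Flaw` record, the `select_flaw` function, and the integer `added` timestamp are hypothetical names introduced for illustration, and only the LIFO ordering is modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flaw:
    kinds: frozenset  # flaw types this flaw matches, e.g. {"o", "t"} (assumed encoding)
    name: str
    added: int        # larger value means the flaw was added more recently


# Strategies as ordered lists of (flaw types, ordering criterion).
UCPOP = [({"n", "s"}, "LIFO"), ({"o"}, "LIFO")]
STATIC_FIRST = [({"t"}, "LIFO"), ({"n", "s"}, "LIFO"), ({"o"}, "LIFO")]


def select_flaw(flaws, strategy):
    """Return the flaw chosen by the first criterion that matches any flaw."""
    for kinds, order in strategy:
        candidates = [f for f in flaws if f.kinds & kinds]
        if not candidates:
            continue  # no flaw of these types: fall through to the next criterion
        if order == "LIFO":
            return max(candidates, key=lambda f: f.added)
        raise ValueError(f"ordering {order!r} is not modelled in this sketch")
    return None  # the plan has no flaws


flaws = [
    Flaw(frozenset({"o"}), "(at pkg ?l)", 3),
    Flaw(frozenset({"o", "t"}), "(road ?x ?y)", 1),  # static open condition
    Flaw(frozenset({"s"}), "threat", 2),
]
print(select_flaw(flaws, STATIC_FIRST).name)
print(select_flaw(flaws, UCPOP).name)
```

Under Static-First the static open condition is chosen even though it is the oldest flaw, whereas the UCPOP strategy resolves the threat first.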

Our earlier results (Younes & Simmons, 2002) indicated that despite a reduction in the 
number of generated plans when planning with lifted actions, using ground actions was 
still faster in most domains. In the gripper domain, for example, while using lifted actions 
resulted in less than half the number of generated plans compared to when using ground 
actions, planning with ground actions was still more than twice as fast. We have greatly 
improved the implementation of the planner and the handling of variable bindings since 
then. When using the latest version of VHPOP in the gripper domain, the planner is 
roughly as fast when planning with lifted actions giving priority to static preconditions as 
when planning with ground actions. 

Local Flaw Selection. Schubert and Gerevini (1995) argue that, by retaining the LIFO order 
for selecting open conditions achievable in multiple ways, the planner tends to maintain 
focus on a particular higher-level goal by regression, instead of trying to achieve multiple 
goals in a breadth-first manner. When some of the goals to achieve are independent, 
maintaining focus on a single goal should be beneficial. The problem with a LIFO-based flaw 
selection strategy, however, as pointed out by Williamson and Hanks (1996), is that it is 
highly sensitive to the order in which operator preconditions are specified in the domain 
description. 

It is not necessary, however, to select the most recently added open condition in order 
to keep focus on the achievement of one goal. We can get the same effect by selecting any of 
the open conditions, but restrict the choice to the most recently added action. We therefore 
introduce a new flaw type, "l", representing local open conditions. A local open condition is 
one that belongs to the most recently added action that still has remaining open conditions. 
We can use any ordering criterion to select among local open conditions. Using this new 
flaw type, we can specify a local variant of LCFR: 

LCFR-Loc {n, s, l}LR 

One would expect such a strategy to be less sensitive to precondition order than a simple 
LIFO-based strategy. We can see evidence of this in Table 5, which also shows that 
the maintained goal focus achieved by local flaw selection strategies can help solve more 
problems compared to a global flaw selection strategy. 
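
The definition of a local open condition can be illustrated with a small Python sketch. It assumes, purely for illustration, that each open condition is a pair of an integer step index (larger means the action was added later) and a literal; the function name `local_open_conditions` is hypothetical.

```python
def local_open_conditions(open_conditions):
    """Open conditions of the most recently added action that still has any.

    open_conditions: list of (step, literal) pairs, where a larger step means
    the action was added to the plan later (an assumed encoding).
    """
    if not open_conditions:
        return []
    latest = max(step for step, _ in open_conditions)
    return [(step, lit) for step, lit in open_conditions if step == latest]


ocs = [(1, "p"), (3, "q"), (3, "r"), (2, "s")]
print(local_open_conditions(ocs))
```

The resulting set does not depend on the order in which the preconditions "q" and "r" were listed, which is why any ordering criterion (such as LR) can then be applied among them without inheriting the order sensitivity of plain LIFO.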

Heuristic Flaw Selection. Distance based heuristics have been used extensively for 
ranking plans in state space planners (e.g., HSP and FF). Nguyen and Kambhampati 
(2001) show that these heuristics can be very useful for ranking plans in POCL planners as 
well. They also suggest that the same heuristics could be used in flaw selection methods, 
but do not elaborate further on this subject. 

It is not hard to see, however, how many of the plan rank heuristics could be used for 
the purpose of selecting among open conditions, since they often are based on estimating 
the cost of achieving open conditions as seen in Section 3.1.1. By giving priority to open 
conditions with the highest heuristic cost, we can build plans in a top-down manner from 
the goals to the initial conditions. We call this ordering criterion "MC" (most cost first). 
By using the opposite ordering criterion, "LC" (least cost first), we would instead tend to build plans in a 
bottom-up manner. Note that these two ordering criteria only apply to open conditions, 
and not to threats, so we would need to use them in combination with selection criteria for 
threats. We can define both global and local heuristic flaw selection strategies: 

MC       {n,s}LR / {o}MC_add 
MC-Loc   {n,s}LR / {l}MC_add 

The subscript for MC indicates the heuristic function to use for ranking open conditions, 
which in this case is the additive heuristic. 

Problem        UCPOP       LCFR        LCFR-Loc    MC          MC-Loc      MW          MW-Loc
               σ/|μ|   n   σ/|μ|   n   σ/|μ|   n   σ/|μ|   n   σ/|μ|   n   σ/|μ|   n   σ/|μ|   n
DriverLog6     0.20   20   0.01   20   0.01   20   0.18   20   0.23   20   0.02   20   0.02   20
DriverLog7     0.23   20   0.10   20   0.32   20   0.13   18   0.25   20   -       -   0.05   20
DriverLog8     0.28   17   -       -   0.00    1   -       -   -       -   -       -   -       -
DriverLog9     0.62    7   0.00   10   0.45   14   -       -   0.01   20   -       -   0.01   20
DriverLog10    0.33   16   -       -   0.07   20   -       -   0.08   20   -       -   0.08   20
ZenoTravel6    0.27   20   0.03    7   0.22   20   -       -   0.00   20   -       -   0.00   20
ZenoTravel7    0.23    8   -       -   0.18   16   -       -   0.16   16   -       -   0.16   16
ZenoTravel8    0.29   11   -       -   0.15   19   -       -   0.18   20   -       -   0.18   20
ZenoTravel9    0.22   17   -       -   0.21   18   -       -   0.19   20   -       -   0.19   20
ZenoTravel10   0.26   18   -       -   0.22   17   -       -   0.15   19   -       -   0.15   19
Satellite6     0.20   19   -       -   0.02   20   -       -   0.02   20   -       -   0.02   20
Satellite7     0.54    9   -       -   0.03   20   -       -   0.07   20   -       -   0.07   20
Satellite8     0.35    8   -       -   0.02   20   -       -   0.07    4   -       -   0.07    4
Satellite9     0.34    7   -       -   0.00    1   -       -   0.00    1   -       -   0.00    1
Satellite10    0.32    9   -       -   0.01   20   -       -   -       -   -       -   -       -

Table 5: Relative standard deviation for the number of generated plans (σ/|μ|) and the 
number of solved problems (n) over 20 instances of each problem with random 
precondition order. Low relative standard deviation indicates low sensitivity to 
precondition order. Results are shown for VHPOP using seven different flaw 
selection strategies. A memory limit of 512 Mb was enforced, and $h^r_{add}$ with estimated 
effort as tie-breaker was used as plan ranking heuristic. 



In Section 3.1.3, we proposed that we can estimate the remaining planning effort for 
an open condition by counting the total number of open conditions that would arise while 
resolving the open condition. This heuristic could also be useful for ranking open conditions, 
and is often more discriminating than an ordering criterion based on heuristic cost. We 
therefore define two additional ordering criteria: "MW" (most work first) and "LW". With 
these, we can define additional flaw selection strategies: 

MW       {n,s}LR / {o}MW_add 
MW-Loc   {n,s}LR / {l}MW_add 

For the planning problems listed in Table 5, we can see that MW-Loc is at most as sensitive 
to precondition order as MC-Loc, with MW-Loc never performing worse than MC-Loc and 
for the first two problems performing clearly better. 
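
The four heuristic ordering criteria for open conditions (MC, LC, MW, LW) differ only in which estimate they rank by and in whether the largest or smallest value is preferred. The Python sketch below illustrates this under stated assumptions: the `cost` and `work` dictionaries stand in for heuristic estimates that the planner would compute, the numbers are made up, and the function name `pick_open_condition` is hypothetical.

```python
def pick_open_condition(open_conditions, cost, work, criterion):
    """Select an open condition by one of the criteria MC, LC, MW, LW.

    cost, work: dicts mapping each open condition to its heuristic cost and
    estimated remaining effort (assumed to be precomputed elsewhere).
    """
    if criterion not in ("MC", "LC", "MW", "LW"):
        raise ValueError(f"unknown ordering criterion {criterion!r}")
    estimate = cost if criterion in ("MC", "LC") else work
    chooser = max if criterion in ("MC", "MW") else min
    # Ties are broken in favor of the earliest listed open condition.
    return chooser(open_conditions, key=lambda q: estimate[q])


ocs = ["p", "q", "r"]
cost = {"p": 2, "q": 2, "r": 1}   # illustrative values only
work = {"p": 3, "q": 5, "r": 1}   # illustrative values only
for criterion in ("MC", "LC", "MW", "LW"):
    print(criterion, pick_open_condition(ocs, cost, work, criterion))
```

In this made-up example "p" and "q" have equal heuristic cost, so MC cannot tell them apart, while MW separates them by estimated effort. This is the sense in which the effort estimate can be more discriminating.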

IxTeT (Ghallab & Laruelle, 1994; Laborie & Ghallab, 1995) also uses heuristic techniques 
to guide flaw selection, but in quite a different way than suggested here. It is our 
understanding of the IxTeT heuristic that it estimates, for each possible refinement r 
resolving a flaw, the amount of change (commitment) that would result from applying r to 
the current plan. For open conditions, this estimate is obtained by expanding a tree of 
subgoal decomposition, which in principle is a regression-match graph (McDermott, 1999). 
This is similar to how heuristic values are computed using the additive heuristic. However, 
IxTeT considers not only the number of actions that need to be added to resolve an open 
condition but also to what degree current variable domains would be reduced and possible 
action orderings restricted. Furthermore, IxTeT uses the heuristic values to choose the 
flaw in which a single refinement stands out the most as the least "costly" compared to 
other refinements for the same flaw. The intended effect is a reduction in the amount of 
backtracking that is needed to find a solution, although we are not aware of any evaluation 
of the effectiveness of the technique. 

Conflict-Driven Flaw Selection. Common wisdom in implementing search heuristics 
for constraint satisfaction problems, e.g. propositional satisfiability, is to first make decisions 
with maximal consequences, so that inconsistencies can be detected early on, pruning large 
parts of the search space. 

A flaw selection strategy that follows this principle would be to link unsafe open con- 
ditions before other open conditions. We call an open condition unsafe if a causal link to 
that open condition would be threatened. By giving priority to unsafe open conditions, 
the planner will direct attention to possible conflicts/inconsistencies in the plan at an early 
stage. We introduce the flaw type "u" representing unsafe open conditions. Examples of 
conflict-driven flaw selection strategies using this new flaw type are the following variations 
of LCFR, LCFR-Loc, and MW-Loc: 

LCFR-Conf       {n,s,u}LR / {o}LR 
LCFR-Loc-Conf   {n,s,u}LR / {l}LR 
MW-Loc-Conf     {n,s}LR / {u}MW_add / {l}MW_add 
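
To make the notion of an unsafe open condition concrete, the following Python sketch gives a deliberately simplified, ground-literal test. It is an assumption-laden approximation and not VHPOP's actual check: literals are plain strings with negation written as a "not " prefix, `before` is assumed to hold the transitive closure of the ordering constraints, and the producer of the future causal link is ignored, so the test flags any action with a conflicting effect that is not already ordered after the consumer. All names are hypothetical.

```python
def negate(literal):
    """Negation of a ground literal in the assumed string encoding."""
    return literal[4:] if literal.startswith("not ") else "not " + literal


def is_unsafe(open_condition, effects, before):
    """Conservative test for whether a link to the open condition could be threatened.

    open_condition: (consumer, literal)
    effects: dict mapping each action to its set of ground effect literals
    before: set of (a, b) pairs meaning a is ordered before b (transitively closed)
    """
    consumer, literal = open_condition
    conflicting = negate(literal)
    return any(
        conflicting in effs and action != consumer and (consumer, action) not in before
        for action, effs in effects.items()
    )


effects = {"a1": {"not p"}, "a2": {"q"}, "a3": set()}
before = {("a3", "a1")}  # a3 is ordered before a1
print(is_unsafe(("a2", "p"), effects, before))
print(is_unsafe(("a3", "p"), effects, before))
```

Here the open condition p of a2 is flagged because a1 deletes p and may come before a2, whereas the same condition for a3 is not flagged because a1 is already ordered after a3.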

The first two of these conflict-driven strategies are very effective in the link-chain domain 
constructed by Veloso and Blythe (1994). The link-chain domain is an artificial domain 
specifically constructed to demonstrate the weakness of POCL planners in certain domains. 
What makes the domain hard for SNLP and UCPOP with their default flaw selection 
strategies is that open conditions can be achieved by several actions but with only one 
action being the right choice because of negative interaction. This forces the POCL planner 
to backtrack excessively over link commitments, but inconsistencies may not be immediately 
detected because of the many link alternatives. We can see in Figure 1 that VHPOP using 
the UCPOP flaw selection strategy performs very poorly in the link-chain domain. Using a 
more sophisticated flaw selection strategy such as LCFR improves performance somewhat. 
However, with the two conflict-driven flaw selection strategies all problems are solved in 
less than a second. The number of generated and explored plans is in fact identical for 
LCFR-Conf and LCFR-Loc-Conf, but LCFR-Loc-Conf is roughly twice as fast as LCFR- 
Conf because of reduced overhead. This demonstrates the benefit of local flaw selection 
strategies. Note, however, that LCFR is faster than LCFR-Loc in the link-chain domain, 
so local strategies are not always superior to global strategies. 

We can also see in Table 1 that conflict-driven flaw selection strategies work well in the 
DriverLog and Depots domains, both with $h_{add}$ and $h^r_{add}$ as heuristic function for ranking 
plans. 

4. Temporal POCL Planning 

In classical planning, actions have no duration: the effects of an action are instantaneous. 
Many realistic planning domains, however, require actions that can overlap in time and 
have different durations. The version of the planning domain definition language (PDDL), 
PDDL2.1, that was used for IPC3 introduces the notion of durative actions. A durative 
action represents an interval of time, and conditions and effects can be associated with 
either endpoint of this interval. Durative actions can also have invariant conditions that 
must hold for the entire duration of the action. 

Figure 1: Average planning time over ten problems for each point with different flaw 
selection strategies in the link-chain domain. Results are shown for VHPOP using 
five different flaw selection strategies. Only points for which a strategy solved 
all ten problems without running out of memory (512 Mb) are shown. The $h^r_{add}$ 
heuristic was used to rank plans. 

We use the constraint-based interval approach to temporal POCL planning described by 
Smith et al. (2000), which in essence is the same approach as used by earlier temporal POCL 
planners such as DEVISER (Vere, 1983), ZENO (Penberthy & Weld, 1994), and IxTeT 
(Ghallab & Laruelle, 1994). Like IxTeT, we use a simple temporal network (STN) to record 
temporal constraints. The STN representation allows for rapid response to temporal queries. 
ZENO, on the other hand, uses an integrated approach for handling both temporal and 
metric constraints, and does not make use of techniques optimized for temporal reasoning. 
The following is a description of how VHPOP handles the type of temporal planning 
domains expressible in PDDL2.1. 

When planning with durative actions, we substitute the partial order O in the 
representation of a plan with an STN T. Each action $a_i$ of a plan, except the dummy actions 
$a_0$ and $a_\infty$, is represented by two nodes $t_{2i-1}$ (start time) and $t_{2i}$ (end time) in the STN 
T, and T can be compactly represented by the d-graph (Dechter et al., 1991). The d-graph 
is a complete directed graph, where each edge $t_i \to t_j$ is labeled by the shortest temporal 
distance, $d_{ij}$, between the two time nodes $t_i$ and $t_j$ (i.e. $t_j - t_i \le d_{ij}$). An additional time 
point, $t_0$, is used as a reference point to represent time zero. By default, $d_{ij} = \infty$ for all 
$i \ne j$ ($d_{ii} = 0$), signifying that there is no upper bound on the difference $t_j - t_i$. 

Figure 2: Matrix representation of d-graph, with $\epsilon = 1$, for STN after (a) adding action 
$a_1$ with duration constraint $\delta_1 \le 7 \wedge \delta_1 \ge 3$, (b) adding action $a_2$ with duration 
constraint $\delta_2 = 4$, and (c) ordering the end of $a_2$ before the end of $a_1$. Explicitly 
added temporal constraints are in boldface. 

Constraints are added to T at the addition of a new action, the linking of an open 
condition, and the addition of an ordering constraint between endpoints of two actions. 

The duration, $\delta_i$, of a durative action $a_i$ is specified as a conjunction of simple duration 
constraints $\delta_i \bowtie c$, where $c$ is a real-valued constant and $\bowtie$ is in the set $\{=, \le, \ge\}$.⁴ Each 
simple duration constraint gives rise to temporal constraints between the time nodes $t_{2i-1}$ 
and $t_{2i}$ of T when adding $a_i$ to a plan $\langle A, L, T, B \rangle$. The temporal constraints, in terms of 
the minimum distance $d_{ij}$ between two time points, are as follows: 



Duration Constraint   Temporal Constraints
$\delta_i = c$        $d_{2i-1,2i} = c$ and $d_{2i,2i-1} = -c$
$\delta_i \le c$      $d_{2i-1,2i} \le c$
$\delta_i \ge c$      $d_{2i,2i-1} \le -c$



The semantics of PDDL2.1 with durative actions dictates that every action be scheduled 
strictly after time zero. Let $\epsilon$ denote the smallest fraction of time required to separate two 
time points. To ensure that an added action $a_i$ is scheduled after time zero, we add the 
temporal constraint $d_{2i-1,0} \le -\epsilon$ in addition to any temporal constraints due to duration 
constraints. Figure 2(a) shows the matrix representation of the d-graph after adding an 
action, $a_1$, with duration constraint $\delta_1 \le 7 \wedge \delta_1 \ge 3$ to a null plan. The rows and columns 
of the matrix correspond to time point 0, the start of action $a_1$, and the end of action $a_1$ 
in that order. After adding action $a_2$ with duration constraint $\delta_2 = 4$, we have the d-graph 
represented by the matrix in Figure 2(b). The two additional rows and columns correspond 
to the start and end of action $a_2$ in that order. 

A temporal annotation $\tau \in \{s, i, e\}$ is added to the representation of open conditions. 
The open condition $\xrightarrow{q@s} a_i$ represents a condition that must hold at the start of the durative 
action $a_i$, $\xrightarrow{q@e} a_i$ represents a condition that must hold at the end of $a_i$, while $\xrightarrow{q@i} a_i$ is an 
invariant condition for $a_i$. An equivalent annotation is added to the representation of causal 
links. The linking of an open condition $\xrightarrow{q@\tau} a_i$ to an effect associated with a time point $t_j$ 
gives rise to the temporal constraint $d_{kj} \le -\epsilon$ ($k = 2i$ if $\tau = e$, else $k = 2i - 1$). Figure 2(c) 
shows the representation of the STN for a plan with actions $a_1$ and $a_2$, as before, and with 
an effect associated with the end of $a_2$ linked to a condition associated with the end of $a_1$. 

4. In contrast, Vere's DEVISER can only handle duration constraints of the form $\delta_i = c$. 

Unsafe causal links are resolved in basically the same way as before, but instead of 
adding ordering constraints between actions we add temporal constraints between time 
points ensuring that one time point precedes another time point. We can ensure that time 
point $t_i$ precedes time point $t_j$ by adding the temporal constraint $d_{ji} \le -\epsilon$. 

Every time we add a temporal constraint to a plan, we update all shortest paths $d_{ij}$ 
that could have been affected by the added constraint. This propagation of constraints can 
be carried out in $O(|A|^2)$ time. 

Once a plan without flaws is found, we need to schedule the actions in the plan, i.e. assign 
a start time and duration for each action. A schedule of the actions is a solution to the STN 
T, and a solution assigning the earliest possible start time to each action is readily available 
in the d-graph representation. The start time of action $a_i$ is set to $-d_{2i-1,0}$ (Corollary 3.2, 
Dechter et al., 1991) and the duration to $d_{2i-1,0} - d_{2i,0}$. Assuming Figure 2(c) represents 
the STN for a complete plan, then we would schedule $a_1$ at time 1 with duration 5 and $a_2$ 
at time 1 with duration 4. We can easily verify that this schedule is indeed consistent with 
the duration constraints given for the actions, and that $a_2$ ends before $a_1$ as required. 
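
The d-graph bookkeeping described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than VHPOP's C++ implementation: the class `STN` and the helpers `add_action`, `precede`, and `earliest_schedule` are hypothetical names, the matrix is a plain list of lists, and ε is taken to be 1 as in Figure 2. The incremental update is the standard quadratic one: a new bound w on $t_j - t_i$ can only shorten paths that pass through the new edge.

```python
import math


class STN:
    """d-graph of a simple temporal network; d[i][j] is an upper bound on t_j - t_i."""

    def __init__(self):
        self.d = [[0.0]]  # node 0 is the reference point t_0 (time zero)

    def add_node(self):
        n = len(self.d)
        for row in self.d:
            row.append(math.inf)  # no upper bound by default
        self.d.append([math.inf] * n + [0.0])
        return n

    def add_constraint(self, i, j, w):
        """Impose t_j - t_i <= w and propagate in O(n^2); False means inconsistent."""
        d = self.d
        if w >= d[i][j]:
            return True   # already implied by existing constraints
        if d[j][i] + w < 0:
            return False  # would create a negative cycle
        n = len(d)
        for k in range(n):
            for l in range(n):
                via = d[k][i] + w + d[j][l]  # path k -> i -> j -> l
                if via < d[k][l]:
                    d[k][l] = via
        return True


def add_action(stn, min_dur, max_dur, eps=1.0):
    """Add start and end nodes for an action with min_dur <= duration <= max_dur."""
    start, end = stn.add_node(), stn.add_node()
    ok = (stn.add_constraint(start, 0, -eps)             # start strictly after time zero
          and stn.add_constraint(start, end, max_dur)    # duration <= max_dur
          and stn.add_constraint(end, start, -min_dur))  # duration >= min_dur
    if not ok:
        raise ValueError("inconsistent duration constraints")
    return start, end


def precede(stn, first, second, eps=1.0):
    """Order time point `first` before `second`, i.e. d[second][first] <= -eps."""
    return stn.add_constraint(second, first, -eps)


def earliest_schedule(stn, start, end):
    """Earliest start time and resulting duration, read off column 0 of the d-graph."""
    d = stn.d
    return -d[start][0], d[start][0] - d[end][0]


stn = STN()
a1 = add_action(stn, 3, 7)   # 3 <= duration <= 7
a2 = add_action(stn, 4, 4)   # duration = 4
precede(stn, a2[1], a1[1])   # end of a2 before end of a1
print(earliest_schedule(stn, *a1))  # (1.0, 5.0)
print(earliest_schedule(stn, *a2))  # (1.0, 4.0)
```

With these inputs the sketch reproduces the schedule stated in the text: both actions start at time 1, the first with duration 5 and the second with duration 4.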

Each non-durative action can be treated as a durative action of fixed duration 0, with 
preconditions associated with the start time, effects associated with the end time, and 
without any invariant conditions. This allows for a frictionless treatment of domains with 
both durative and non-durative actions. 

Let us for a moment consider the memory requirements for temporal POCL planning 
compared to classical POCL planning. When planning with non-durative actions, we store 
O as a bit-matrix representing the transitive closure of the ordering constraints in O. For a 
partial plan with $n$ actions, this requires $n^2$ bits. With $n$ durative actions, on the other hand, 
we need roughly $4n^2$ floating-point numbers to represent the d-graph of T. Each 
floating-point number requires at least 32 bits on a modern machine, so in total we need more than 
100 times as many bits to represent temporal constraints as regular ordering constraints 
for each plan. We note, however, that each refinement changes only a few entries in the 
d-graph, and by choosing a clever representation of matrices we can share storage between 
plans. The upper left 3 × 3 sub-matrix in Figure 2(b) is for example identical to the matrix 
in Figure 2(a). The way we store matrices in VHPOP allows us to exploit this commonality 
and thereby reduce the total memory requirements. 
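
The factor of "more than 100" follows from a short calculation. As a worked illustration (the plan size $n = 50$ is an arbitrary example, not a figure from the experiments): $n$ durative actions give $2n + 1$ time points including $t_0$, so the d-graph has $(2n+1)^2 \approx 4n^2$ entries.

\[
\frac{\text{bits for d-graph}}{\text{bits for ordering matrix}}
  \approx \frac{32 \cdot 4n^2}{n^2} = 128,
\qquad
n = 50:\quad
\frac{32 \cdot 101^2}{50^2} = \frac{326\,432}{2\,500} \approx 131 .
\]

For this example the ordering bit-matrix occupies about 313 bytes per plan, while an unshared d-graph would occupy about 40 KB, which is why sharing sub-matrices between plans matters.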

The addition of durative actions does not change the basic POCL algorithm. The 
recording of temporal constraints and temporal annotations can be handled in a manner 
transparent to the rest of the planner. The search heuristics described in Section 3, although 
not tuned specifically for temporal planning, can be used with durative actions. We only 
need to slightly modify the definition of literal and action cost in the additive heuristic 
because of the temporal annotations associated with preconditions and effects of durative 
actions. Let $GA_s(q)$ denote the set of ground actions achieving $q$ at the start, and $GA_e(q)$ 
the set of ground actions achieving $q$ at the end. We define the cost of the literal $q$ as 

\[
h_{add}(q) =
\begin{cases}
0 & \text{if } q \text{ holds initially} \\
\min_{a \in GA_t(q)} h_{add}(a@t) & \text{if } GA_t(q) \neq \emptyset \\
\infty & \text{otherwise}
\end{cases}
\]

with $t \in \{s, e\}$ and the cost of a durative action $a$ at endpoint $t$ defined as 

\[
h_{add}(a@t) = 1 + h_{add}(Prec_t(a)).
\]

$Prec_s(a)$ is a propositional formula representing the invariant preconditions of $a$ and 
preconditions associated with the start of $a$, while $Prec_e(a)$ is a formula representing all 
preconditions of $a$. 

5. VHPOP at IPC3 

VHPOP allows for several flaw selection strategies to be used simultaneously in a 
round-robin scheme. This lets us exploit the strengths of different flaw selection strategies 
concurrently, which was essential for the success of VHPOP at IPC3 since we have yet to find 
a single superior flaw selection strategy that dominates all other flaw selection strategies 
in terms of the number of solved problems within a given time frame. The technique we 
use in VHPOP for supporting multiple flaw selection strategies is in essence the same as 
the technique proposed by Howe et al. (1999) for exploiting performance benefits of several 
planners at once in a meta-planner. Although the meta-planner is slower than the fastest 
planner on any single problem, it can solve more problems than any single planner. 

We used four different flaw selection strategies at IPC3 (Table 6), preferring local flaw 
selection strategies as they tend to incur a lower overhead than global strategies such as 
LCFR and MW and often appear more effective than global strategies because of a 
maintained focus on subgoal achievement. The four strategies were selected after some initial 
experimentation with problems from a few of the competition domains. 

Name            Specification
MW-Loc          {n,s}LR / {l}MW_add
MW-Loc-Conf     {n,s}LR / {u}MW_add / {l}MW_add
LCFR-Loc        {n,s,l}LR
LCFR-Loc-Conf   {n,s,u}LR / {l}LR

Table 6: Flaw selection strategies used by VHPOP at IPC3. 

The use of multiple flaw selection strategies can be thought of as running multiple 
concurrent instances of the planner, as a separate search queue is maintained for each 
flaw selection strategy that is used. Similar to HSP2.0 at the planning competition in 
2000 (Bonet & Geffner, 2001a), we use a fixed control strategy to schedule these multiple 
instances of our planner. The first time a flaw selection strategy is used, it is allowed to 
generate up to 1000 search nodes. The second time the same flaw selection strategy is 
used, it can generate another 1000 search nodes, making it a total of 2000 search nodes. 
At each subsequent round $i$, each flaw selection strategy is permitted to generate up to 
$1000 \cdot 2^{i-2}$ additional nodes. The maximum number of nodes generated using a specific flaw 
selection strategy is $1000 \cdot 2^{i-1}$ after $i$ rounds. An optional upper limit on the number of 
generated search nodes can be set for each flaw selection strategy. This is useful for flaw 
selection strategies that typically solve problems quickly, when they solve them at all within 
reasonable time. Table 7 shows the search limits used by VHPOP at IPC3. These limits 
were determined after some initial trials on the competition problems. Note that there was 
no set search limit for the last flaw selection strategy. Whenever the other three strategies 
all reached their search limits without having found a solution, LCFR-Loc-Conf was used 
until physical resource limits were reached. 

Name            Order   STRIPS Limit   Durative Limit
MW-Loc          1       10000          12000
MW-Loc-Conf     2       100000         100000
LCFR-Loc        3       200000         240000
LCFR-Loc-Conf   4       ∞              ∞

Table 7: Execution order of flaw selection strategies used at IPC3, and also search limits 
used with each strategy on domains with and without durative actions. 

Table 8 shows the number of plans generated in the STRIPS Satellite domain before a 
solution is found for the four flaw selection strategies used at IPC3, and also the number 
of generated plans when combining the four strategies using the schedule in Table 7. To 
better understand how the round-robin scheduling works, we take a closer look at the 
numbers for problem 15. Table 9 shows how the total number of generated plans is divided 
between rounds and flaw selection strategies. Note that although MW-Loc is actually the 
best strategy for this problem, it is stopped already in round 5. The total number of 
generated plans does not exactly match the actual number of generated plans reported in 
Table 8. This is because we only consider suspending the use of a flaw selection strategy 
after all refinements of the last selected plan have been added, so the limit in a round can 
be exceeded slightly in practice. The numbers in Table 9 represent an idealized situation 
where flaw selection strategies are switched when the number of generated plans exactly 
matches the limit for the current round. 
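
The idealized round-robin accounting can be reproduced with a short Python simulation. This is a sketch under stated assumptions: `round_quota` and `simulate` are hypothetical names, strategies are switched exactly when the round quota is used up, and a strategy that ran out of memory in Table 8 is modelled as never finding a solution (`math.inf`). The inputs below are the STRIPS limits from Table 7 and the per-strategy plan counts for problem 15 from Table 8.

```python
import math


def round_quota(i):
    """Idealized number of nodes a strategy may add in round i (1-based)."""
    return 1000 if i == 1 else 1000 * 2 ** (i - 2)


def simulate(needed, limits, max_rounds=64):
    """Round-robin over strategies in execution order.

    needed[k]: nodes strategy k must generate to find a solution (math.inf if never)
    limits[k]: upper limit on the total nodes strategy k may generate
    Returns (index of the strategy that finds the solution or None, nodes used per strategy).
    """
    used = [0] * len(needed)
    for i in range(1, max_rounds + 1):
        for k in range(len(needed)):
            budget = min(round_quota(i), limits[k] - used[k])
            if budget <= 0:
                continue  # this strategy has reached its search limit
            if needed[k] - used[k] <= budget:
                used[k] = needed[k]
                return k, used
            used[k] += budget
    return None, used


# Order: MW-Loc, MW-Loc-Conf, LCFR-Loc, LCFR-Loc-Conf
needed = [74738, math.inf, 107375, math.inf]
limits = [10000, 100000, 200000, math.inf]
winner, used = simulate(needed, limits)
print(winner, used, sum(used))
```

The simulation ends in round 8 with LCFR-Loc finding the solution, and the per-strategy totals of 10000, 100000, 107375, and 64000 sum to 281375 generated plans, matching the idealized totals in Table 9.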

VHPOP solved 122 problems out of 224 attempted at IPC3. The quality of the plans 
generated by VHPOP, in terms of number of steps, was generally very high. For plain 
STRIPS domains, VHPOP's plans were typically within 10 percent of the best plans found 
by any planner in the competition, with 28 of VHPOP's 68 STRIPS plans being at least as 
short as the best plans found. Being a POCL planner, VHPOP also automatically exploits 
parallelism in planning domains, generating plans for STRIPS domains with low total plan 
execution time (Table 10). Table 11 shows that VHPOP also performed well in terms of 
number of solved problems in four of the six STRIPS domains, being competitive with top 
performers such as MIPS and LPG (particularly in the Rovers domain). 

Problem   MW-Loc    MW-Loc-Conf   LCFR-Loc   LCFR-Loc-Conf      All
1            118            118        118             118      118
2            229            229        249             249      229
3            172            172        172             172      172
4            738            843        822            1797      738
5            448        723000†       1018         706000†      448
6        636000†        629000†        720             834     2727
7            571            745        620         688000†      571
8        482000†            874       1017             783     1874
10          1245           1178       1323            1275     4283
11          1172           1172       1172            1172     4172
12          3517           3733    525000†         525000†     9542
13          6241        382000†    559000†         544000†    18265
14          2352           2352       2157            2157     8365
15         74738        444000†     107375         465000†   281387
16       533000†        529000†       3442            3571    13471
17          2975           2975       3438            3438     8981
18          1584           1584       1724            1724     4588

Table 8: Number of generated plans in the STRIPS Satellite domain for the four different 
flaw selection strategies used by VHPOP at IPC3. The rightmost column is the 
number of plans generated by VHPOP before finding a solution when using the 
schedule in Table 7. A dagger (†) means that the planner ran out of memory 
(800 Mb) after generating at least the indicated number of plans. 



Round   MW-Loc   MW-Loc-Conf   LCFR-Loc   LCFR-Loc-Conf    Total
1         1000          1000       1000            1000     4000
2         1000          1000       1000            1000     4000
3         2000          2000       2000            2000     8000
4         4000          4000       4000            4000    16000
5        2000*          8000       8000            8000    26000
6            -         16000      16000           16000    48000
7            -         32000      32000           32000    96000
8            -        36000*      43375               -    79375
Total    10000        100000     107375           64000   281375

Table 9: A closer look at the round-robin scheduling for problem 15 in the STRIPS Satellite 
domain. Entries marked with an asterisk indicate that the search limit for a flaw 
selection strategy was reached in the round. 

Domain       # Solved   # Steps   # Best   Execution Time   # Best
DriverLog          14      1.09        5             1.15        4
ZenoTravel         13      1.04        7             1.20        5
Satellite          17      1.07        7             1.25        5
Rovers             20      1.08        7             1.08       13

Table 10: Relative plan quality for the STRIPS domains where VHPOP solved more than 
half of the problems. There are two plan quality metrics. Number of steps is 
simply the total number of steps in a plan, while execution time is the total 
time required to execute a plan (counting parallel actions as one time step). The 
table shows the average ratio of VHPOP's plan quality to the quality of the 
best plan generated by any planner, together with the number of problems in 
each domain where VHPOP found the best plan. 



Planner      Depots   DriverLog   ZenoTravel   Satellite   Rovers   FreeCell   Total
FF               22          15           20          20       20         20     117
LPG              21          18           20          20       12         18     109
MIPS             10          15           16          14       12         19      86
SimPlanner       22          11           20          17        9         12      91
Stella            4          10           18          14        4          -      50
VHPOP             3          14           13          17       20          1      68

Table 11: Number of problems solved by top performing fully automated planners in 
STRIPS domains. 



In domains with durative actions,⁵ total execution time was given as an explicit plan 
metric, and the objective was to minimize this metric. The specification of an explicit plan 
metric is a feature of PDDL2.1 not present in earlier versions of PDDL. As VHPOP 
currently ignores this objective function and always tries to find plans with few steps, it should 
come as no surprise that the quality of VHPOP's plans for domains with durative actions 
was significantly worse than the quality of the best plans found (Table 12).⁶ VHPOP 
still produced plans with few steps, however, with over 60 percent of VHPOP's plans for 
domains with durative actions having the fewest steps. The plan selection heuristic that 
VHPOP uses is tuned for finding plans with few steps, and it would need to be modified in 
order to find plans with shorter total execution time. Table 13 shows that LPG solved by 
far the most problems in domains with durative actions, but that VHPOP was competitive 
with MIPS and clearly outperformed TP4 and TPSYS. 

5. There were two types of domains with durative actions at IPC3: "SimpleTime" domains having actions 
with constant duration and "Time" domains with durations being functions of action parameters. The 
results with durative actions presented in this paper are for "SimpleTime" domains as there is currently 
no support for durations as functions of action parameters in VHPOP. It would, in principle, not be 
hard to add such support though, and we expect future versions of VHPOP to have it. 

6. The poor performance is in part also due to the use of 1 for $\epsilon$ (see Section 4) in VHPOP, while most 
other planners used 0.01 or less. Using 0.01 for $\epsilon$ with VHPOP reduces the total execution time of plans 
by about 15 percent. 

Domain       # Solved   # Steps   # Best   Execution Time   # Best
DriverLog          14      1.04        8             1.50        -
ZenoTravel         13      1.04       10             1.54        -
Satellite          17      1.04        9             2.51        -
Rovers              7      1.04        5             1.39        -

Table 12: Same information as in Table 10, but for domains with durative actions. 



Planner   Depots   DriverLog   ZenoTravel   Satellite   Rovers   Total
LPG           20          20           20          20       12      92
MIPS          11          15           14           9        9      58
TP4            1           2            5           3        4      15
TPSYS          -           2            2           2        4      10
VHPOP          3          14           13          17        7      54

Table 13: Number of problems solved by fully automated planners in domains with durative 
actions. 



While VHPOP was a top performer at IPC3 in terms of plan quality, it was far from 
the top in terms of planning time. VHPOP was typically orders of magnitude slower than 
the fastest planner. The high planning times for VHPOP can in part be attributed to 
implementation details. Improvements to the code (e.g. using pointer comparison instead 
of string comparison whenever possible) since the planning competition have resulted in 10 to 
20 percent lower planning times when using ground actions, and when using lifted actions the 
planner is more than twice as fast as before. The reachability analysis is still a bottleneck, 
however, and further improvements could definitely be made there. It is important to 
remember, though, that we basically run four planners at once by using four flaw selection 
strategies concurrently. Table 14 shows the average relative performance of VHPOP at 
IPC3 compared to the performance of VHPOP using only the best flaw selection strategy 
for each problem. VHPOP with the best flaw selection strategy is on average two to three 
times faster than VHPOP with four concurrent strategies. Using several flaw selection 
strategies simultaneously helps us solve more problems, but the price is reduced speed. By 
more intelligently scheduling the different flaw selection strategies depending on domain and 
problem features, and not just using a fixed schedule for all problems, we could potentially 
increase planner efficiency significantly. 
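
The cost of running several strategies side by side can be illustrated with a small sketch. It is not VHPOP's actual scheduler: the round-robin policy, the function names `toy_search` and `run_round_robin`, and the notion of a "unit of work" are all assumptions made for illustration. Each strategy is modeled as a Python generator that yields `None` for every unit of work and finally yields a plan.

```python
from typing import Dict, Iterator, Optional, Tuple


def toy_search(steps_needed: int, plan: str) -> Iterator[Optional[str]]:
    """Hypothetical stand-in for one planner instance.

    Yields None for each unit of work, then yields the plan on the
    steps_needed-th unit.
    """
    for _ in range(steps_needed - 1):
        yield None
    yield plan


def run_round_robin(
    searches: Dict[str, Iterator[Optional[str]]]
) -> Tuple[str, str, int]:
    """Give every strategy one unit of work in turn until one finds a plan.

    Returns (name of winning strategy, plan, total units of work spent).
    """
    active = dict(searches)
    work = 0
    while active:
        for name in list(active):
            work += 1
            try:
                result = next(active[name])
            except StopIteration:
                # This strategy exhausted its search without a plan.
                del active[name]
                continue
            if result is not None:
                return name, result, work
    raise RuntimeError("no strategy found a plan")
```

With four hypothetical strategies needing 50, 10, 30, and 80 units of work, the round-robin run stops after 38 units, which is 3.8 times the 10 units the best strategy would need on its own. These numbers are invented, but the ratio is the same kind of quantity that Table 14 reports. A schedule that starts with the strategy most likely to succeed on a given problem would push this ratio toward 1.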

6. Discussion 

McDermott (2000) finds the absence of POCL planners at the first planning competition in 
1998 striking, as such planners had been dominating planning research just a few years earlier. 
"It seems doubtful that the arguments in [POCL planners'] favor were all wrong, and it 
would be interesting to see partial-order planners compete in future competitions", McDermott 
writes. After two competitions without POCL planners, we believe that VHPOP's 
performance at IPC3 demonstrates that POCL planning — at least with ground actions — 
can be competitive with CSP-based and heuristic state space planning. VHPOP also shows 

Domain        STRIPS   Durative

DriverLog      2.52      2.66
ZenoTravel     2.76      2.86
Satellite      1.78      2.01
Rovers         2.32      3.37

Table 14: Each number in the table represents the average ratio of the planning time for 
VHPOP using all four flaw selection strategies concurrently and the planning 
time for VHPOP with only the fastest flaw selection strategy. 



that temporal POCL planning can be made practical by using the same heuristic techniques 
that have been developed for classical planning. The idea of using the POCL paradigm for 
temporal planning is not new and goes back at least to Vere's DEVISER (Vere, 1983), but 
we are the first to demonstrate the effectiveness of temporal POCL planning on a larger set 
of benchmark problems. 

We hope that the success of VHPOP at IPC3 will inspire a renewed interest in plan 
space planning, and we have made the source code for VHPOP, written in C++, available 
to the research community in an online appendix so that others can build on our effort. 7 

While VHPOP performed well above our expectations at IPC3, we see several ways in 
which we can further improve the planner. Speed, as mentioned in Section 5, is the principal 
weakness of VHPOP. The code for the reachability analysis is not satisfactory, as it cur- 
rently generates ground action instances before performing any reachability analysis. This 
often leads to many ground action instances being generated that do not have preconditions 
with finite heuristic cost (according to the additive heuristic). We believe that VHPOP 
could profit from code for reachability analysis in well-established planning systems such 
as FF. We could also improve speed by better scheduling different flaw selection strategies. 
We would like to see statistical studies, similar to that of Howe et al. (1999), linking domain 
and problem features to the performance of various flaw selection strategies. 
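
To make the reachability issue concrete, the following minimal sketch computes additive heuristic costs for ground STRIPS actions and then discards the actions whose preconditions have no finite cost. It uses the standard textbook form of the additive heuristic with unit action costs and ignores delete effects. It is not VHPOP's code, and the names `Action`, `additive_costs`, and `reachable_actions` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Action(NamedTuple):
    """A ground STRIPS action (delete effects are ignored by the heuristic)."""
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]


def additive_costs(init: Set[str], actions: List[Action]) -> Dict[str, float]:
    """Fixed-point computation of additive costs with unit action costs.

    A literal absent from the returned dict has infinite cost.
    """
    cost: Dict[str, float] = {p: 0.0 for p in init}
    changed = True
    while changed:
        changed = False
        for a in actions:
            # Additive assumption: cost of a conjunction is the sum of its parts.
            pre_cost = sum(cost.get(q, math.inf) for q in a.pre)
            if math.isinf(pre_cost):
                continue
            for p in a.add:
                if 1.0 + pre_cost < cost.get(p, math.inf):
                    cost[p] = 1.0 + pre_cost
                    changed = True
    return cost


def reachable_actions(init: Set[str], actions: List[Action]) -> List[Action]:
    """Keep only actions whose preconditions all have finite cost."""
    cost = additive_costs(init, actions)
    return [a for a in actions if all(q in cost for q in a.pre)]
```

Note that this sketch has the same drawback as the one described above: every ground instance must exist before the filter runs. Interleaving instantiation with the cost computation, so that an action is only grounded once its preconditions are known to be reachable, would avoid creating the useless instances in the first place.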

We have so far only considered using different flaw selection strategies. However, run- 
ning multiple instances of VHPOP using different plan selection heuristics could be equally 
interesting. We have, for example, noticed that using the additive heuristic without account- 
ing for reuse helps us solve two more problems in the Satellite domain. It would also be 
interesting to have the FF heuristic implemented in VHPOP and see how well it performs 
in a plan space planner, possibly using local search techniques instead of A*. It is not likely, 
however, that the results on local search topology for the FF heuristic in state space (Hoff- 
mann, 2001) carry over to plan space. While many of the benchmark planning domains 
contain actions whose effects can be undone by other actions, the plan operators causing 
transitions in the search space of a plan space planner are different from the actions defined 
for a planning domain, and the effects of a plan space operator are generally irreversible. 
We would likely need to add transformational plan operators that can undo linking and 
ordering decisions. Incidentally, VHPOP started out as a project for adding transforma- 
tional plan operators to UCPOP, but we got side-tracked by the need for better search 

7. The latest version of VHPOP is available for download at www.cs.cmu.edu/~lorens/vhpop.html. 
control, and our research on transformational POCL planning was suspended. With the 
recent improvements in search control for POCL planners, it may be worthwhile to once 
again consider adding transformational plan operators. 

In addition to considering different search control heuristics, we could also have instances 
of VHPOP working with lifted actions instead of ground actions. Recent improvements 
to the code have significantly reduced the overhead for maintaining binding constraints, 
making planning with lifted actions look considerably more favorable than was reported in 
earlier work (Younes & Simmons, 2002). Planning with lifted actions could be beneficial for 
problems with a high branching factor in the search space due to a large number of objects. 
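
A rough illustration of why the number of objects matters: for an action schema with k parameters and n objects, naive grounding produces up to n^k instances, whereas a lifted planner keeps a single schema and constrains its variables as needed. The sketch below assumes untyped parameters and allows repeated arguments, which overstates the count for typed domains. The helper name `ground_instances` is hypothetical.

```python
from itertools import product
from typing import List, Tuple


def ground_instances(num_params: int, objects: List[str]) -> List[Tuple[str, ...]]:
    """Enumerate all argument tuples for an untyped schema with num_params parameters."""
    return list(product(objects, repeat=num_params))


# A three-parameter schema such as drive(?truck, ?from, ?to):
ten_objects = [f"obj{i}" for i in range(10)]
twenty_objects = [f"obj{i}" for i in range(20)]
print(len(ground_instances(3, ten_objects)))     # 10**3 = 1000
print(len(ground_instances(3, twenty_objects)))  # 20**3 = 8000
```

Doubling the number of objects multiplies the number of ground instances of this schema by eight, while the lifted representation is unchanged.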

We would also like to see support for numeric effects and preconditions in future versions 
of VHPOP. This would make VHPOP fully compatible with PDDL2.1. We have also 
mentioned the need for plan ranking heuristics better tailored for temporal planning, so 
that VHPOP's performance in terms of plan execution time for domains with durative 
actions can approach the performance of the best temporal planners at IPC3. 